Skip to content

Fix ARG_NOT_SET deserialization for _date fields in OperatorSerialization#66564

Open
hwang-cadent wants to merge 2 commits intoapache:mainfrom
hwang-cadent:fix-arg-not-set-deserialization-date-fields
Open

Fix ARG_NOT_SET deserialization for _date fields in OperatorSerialization#66564
hwang-cadent wants to merge 2 commits intoapache:mainfrom
hwang-cadent:fix-arg-not-set-deserialization-date-fields

Conversation

@hwang-cadent
Copy link
Copy Markdown
Contributor

Summary

OperatorSerialization._deserialize_field_value previously routed every *_date field directly to _deserialize_datetime. When an operator stores a date-suffixed field as NOTSET (an ArgNotSet sentinel meaning "use the default at runtime"), the value is encoded as {__type: ARG_NOT_SET}. Passing that encoding to _deserialize_datetime raises because it is not a datetime payload, so deserializing such an operator (for example TriggerDagRunOperator whose logical_date defaults to NOTSET) fails.

This PR short-circuits the ARG_NOT_SET case on date-suffixed fields and restores the NOTSET singleton. None values and real datetimes keep their existing behavior.

Why a separate PR

This was originally bundled inside #56973 (dynamic dag_id resolution in TriggerDagRunOperator links). Per reviewer feedback from @potiuk, splitting it out keeps each concern independently reviewable, bisectable and backportable. #56973 is being rebased to drop the serialization change once this PR lands.

Changes

  • airflow-core/src/airflow/serialization/serialized_objects.py: handle ARG_NOT_SET in _deserialize_field_value for date-suffixed fields.
  • airflow-core/tests/unit/serialization/test_serialized_objects.py: add test_deserialize_field_value_with_arg_not_set_for_date_fields covering logical_date, start_date, end_date.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Testing

``OperatorSerialization._deserialize_field_value`` previously sent every
``*_date`` field to ``_deserialize_datetime`` unconditionally. When an
operator stores a date-suffixed field as ``NOTSET`` (an ``ArgNotSet``
sentinel meaning "use the default at runtime") the value is encoded as
``{__type: ARG_NOT_SET}``. Passing that encoded value to
``_deserialize_datetime`` fails because it is not a datetime payload, so
deserializing such an operator (e.g. ``TriggerDagRunOperator`` whose
``logical_date`` defaults to ``NOTSET``) raised an error.

This change short-circuits ARG_NOT_SET on date fields and restores the
``NOTSET`` singleton, leaving ``None`` and real datetimes unchanged.

Adds a direct unit test in ``test_serialized_objects.py`` covering
``logical_date``, ``start_date`` and ``end_date``.
@hwang-cadent hwang-cadent force-pushed the fix-arg-not-set-deserialization-date-fields branch from e1d1f5a to f2b465e Compare May 9, 2026 13:51
Copy link
Copy Markdown
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

Small, focused fix in OperatorSerialization._deserialize_field_value. When a _date-suffixed field round-trips through serialization with the value {Encoding.TYPE: DAT.ARG_NOT_SET}, the code now returns the NOTSET sentinel instead of feeding the dict to _deserialize_datetime (which fails). One unit test verifies the fix for logical_date / start_date / end_date. Real-world repro is TriggerDagRunOperator.logical_date=NOTSET.

Code correctness — questions

1. Why scope the fix to _date fields only?

The existing _deserialize_field_value branches are: resourcesfrom_dict, *_date_deserialize_datetime, otherwise → return-as-is. The fallback branch would return {Encoding.TYPE: DAT.ARG_NOT_SET} as the literal dict, not as NOTSET. So the same class of bug exists for every non-date field that an operator stores as NOTSET. If TriggerDagRunOperator has only a date-typed NOTSET-able field today, the scope is technically OK — but the fix is fragile (next operator using NOTSET on a string field would silently get a dict back).

A safer shape: hoist the ARG_NOT_SET check above the field-type dispatch.

if isinstance(value, dict) and value.get(Encoding.TYPE) == DAT.ARG_NOT_SET:
    from airflow.serialization.definitions.notset import NOTSET
    return NOTSET
# ...existing field-name dispatch

That makes ARG_NOT_SET symmetric with how it's serialized (a generic encoding), regardless of field type.

2. The symmetry assumption isn't tested

The test only proves the deserialize side handles the encoding. There's no test that actually serializes TriggerDagRunOperator(logical_date=NOTSET) and confirms the encoding emitted by the serializer is the one this branch matches. A round-trip test (serialize → deserialize → identity) would catch the case where the serializer changes its NOTSET encoding shape (e.g. wrapping in {Encoding.VAR: ..., Encoding.TYPE: ...} instead of the bare {TYPE: ARG_NOT_SET} form the test mocks).

3. endswith(\"_date\") is broad

Matches logical_date, start_date, end_date, plus any future operator field a contributor names with that suffix that isn't actually a datetime (e.g. update_date_str, last_seen_date_iso). Not a regression — just inheriting the existing behavior — but worth noting if you do the hoist above (the check then becomes type-agnostic and you don't widen this surface further).

Test coverage

  • Unit test covers the three current fields. Good.
  • Missing: round-trip test (operator instance → SerializedDAG.serialize_dagSerializedDAG.deserialize_dagassert .logical_date is NOTSET). That's the test that would have caught the bug at PR-time and would catch the symmetry break.
  • Missing: parametrize across the three fields rather than a for loop so a failure in one shows the failing case clearly in pytest output.

Style / conventions

  • Inline from airflow.serialization.definitions.notset import NOTSET inside the method — consistent with how Airflow handles potential circular imports in the serialization module. Fine.
  • Comment is well-written and names the concrete failure case (TriggerDagRunOperator). Good.
  • Docstring on the test explains the scenario clearly. Good.

Other

  • No newsfragment. This fixes a user-visible bug (operator with logical_date=NOTSET would previously fail to deserialize) — likely warrants a *.bugfix.rst newsfragment.
  • No mention of when the regression landed in the PR description. Helpful for downstream maintainers deciding whether to backport to v3-2-test/etc.

Verdict

The fix is correct for the demonstrated case and the test pins the behavior. Two recommendations before approving:

  1. Hoist the ARG_NOT_SET check above the field-name dispatch (~3 lines moved) so any future field type is covered, not just _date.
  2. Add a round-trip test through SerializedDAG so the serialize/deserialize symmetry is the thing under test (not just the deserialize half).

Add a newsfragment in either case. Otherwise LGTM as a tightly-scoped bug fix.

Copy link
Copy Markdown
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes until responses are given.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants